{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1 深度学习与深层神经网络\n",
    "**深度学习：机器学习的分支，是一种试图使用包含复杂结构或由多重非线性变换构成的多个处理层对数据进行高层抽象的算法。--[wikipedia](https://zh.wikipedia.org/zh-hans/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0)**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.1 线性模型的局限性\n",
    "**线性模型最大特点是任意线性模型的组合仍然还是线性模型**，因此，只通过线性变换，任意层的全连接神经网路和单层神经网络的表达能力没有区别，而且他们都是线性的，而线性模型能够解决的问题时有限的。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.2 激活函数实现去线性化\n",
    "<p align='center'>\n",
    "    <img src=images/图4.5.JPG>\n",
    "    <img src=images/图4.7.JPG>\n",
    "</p>\n",
    "目前TensorFlow提供了7种不同的非线性激活函数，tf.nn.relu、tf.sigmoid、tf.tanh是其中比较常用的几个。\n",
    "\n",
    "未添加偏置和非线性激活函数的前向传播（3.4.3节）：\n",
    "\n",
    "`a = tf.matmul(x, w1)\n",
    "y = tf.matmul(a, w2)`\n",
    "\n",
    "添加偏置和非线性激活函数的前向传播：\n",
    "\n",
    "`a = tf.nn.relu(tf.matmul(x, w1) + biases1)\n",
    "y = tf.nn.relu(tf.matmul(a, w2) + biases2)`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用tensorflow的[playground](http://playground.tensorflow.org/)进行实验，线性激活函数和非线性激活函数的结果分别如下图：\n",
    "<p align='center'>\n",
    "    <img src=images/playground1.JPG>\n",
    "    <img src=images/playground2.JPG>\n",
    "</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.3 多层网络解决异或问题\n",
    "1943年，Warren McCulloch和Walter Pitts提出了最早的神经网络的理论模型；1958年Frank Rosenblatt提出了感知机模型（可以简单理解为单层的神经网络），在数学上完成了对神经网络的精确建模；1969年，Marvin Minsky和Seymour Papert提出感知机无法解决异或问题。\n",
    "\n",
    "**但是在神经网络中加入隐藏层后，异或问题可以得到很好的解决**，playground上实验结果分别如下图：\n",
    "<p align='center'>\n",
    "    <img src=images/playground3.JPG>\n",
    "    <img src=images/playground4.JPG>\n",
    "</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2 损失函数定义\n",
    "### 4.2.1 经典损失函数\n",
    "**分类问题和回归问题是监督学习的两大种类**。\n",
    "\n",
    "先讨论**分类问题**，其希望解决的是将不同的样本分到事先定义好的类别中。解决分类问题的神经网络的输出一般为一个n维数组，形如[0,1,0,0,0,0]表示属于理想情况下的第二类。\n",
    "\n",
    "交叉熵（cross entropy）是信息论中的一个概念，原本是用来评估计算编码长度的，给定两个概率分布（任意事件发生的概率都在0-1之间，且概率和为1）p和q，通过q来表示p的交叉熵为：\n",
    "\n",
    "$$ H(p, q) = -\\sum_{x}p(x)logq(x) $$\n",
    "\n",
    "交叉熵刻画的是两个概率分布之间距离，越小表示越接近。然而神经网络的输出并不一定是一个概率分布，这就需要用到如下的softmat函数，负责将神经网络的输出转换成概率分布：\n",
    "\n",
    "$$ softmax(y)_{i} = \\hat{y_{i}} = \\frac{e^{y_i}}{\\sum_{j=1}^n{e^{y_i}}} $$\n",
    "\n",
    "3.4.5节中已经通过TF实现过交叉熵：\n",
    "\n",
    "`cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)) + \n",
    "(1 - y_) * tf.log(tf.clip_by_value(1 - y, 1e-10, 1.0)))`\n",
    "\n",
    "y_表示正确结果，y表示预测结果,其中分为四个运算步骤："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[2.5 2.5 3. ]\n",
      " [4.  4.5 4.5]]\n",
      "[[0.        0.6931472 1.0986123]\n",
      " [1.3862944 1.609438  1.7917595]]\n",
      "[[ 2.  2.]\n",
      " [20. 20.]]\n",
      "[[12.  9.]\n",
      " [33. 24.]]\n",
      "3.5\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "v1 = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])\n",
    "v2 = tf.constant([[1.0, 2.0], [4.0, 5.0]])\n",
    "v3 = tf.constant([[2.0, 1.0], [5.0, 4.0]])\n",
    "sess = tf.InteractiveSession()\n",
    "\n",
    "# 1. tf.clip_by_value函数将一个张量中的数值限制在一定范围，这样可以避免运算错误，如log0\n",
    "print(tf.clip_by_value(v1, 2.5, 4.5).eval())\n",
    "\n",
    "# 2. tf.log函数完成对张量中所有元素依次求对数\n",
    "print(tf.log(v1).eval())\n",
    "\n",
    "# 3. * 表示各元素之间直接相乘，对比tf.matmul表示矩阵乘法\n",
    "print((v2 * v3).eval())\n",
    "print(tf.matmul(v2, v3).eval())\n",
    "\n",
    "# 4. tf.reduce_mean计算均值\n",
    "print(tf.reduce_mean(v1).eval())\n",
    "sess.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "由于交叉熵一般会与softmax回归一起使用，所以TF使用tf.nn.softmax_cross_entropy_with_logits函数对二者进行了封装：\n",
    "\n",
    "`cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)`\n",
    "\n",
    "对于**回归问题**，主要解决的是对具体数值的预测，解决回归的神经网络一般只有一个输出，即预测值。最常用的损失函数是均方误差（MSE，mean squared error）：\n",
    "\n",
    "$$ MSE(y, \\hat{y}) = \\frac{\\sum_{i=1}^n{(y_i - \\hat{y_i})^2}}{n} $$\n",
    "\n",
    "其中$y_i$为一个batch中第i个数据正确答案，$\\hat{y_i}$为神经网络给出的预测值，mse在TF中的实现为：\n",
    "\n",
    "`mse = tf.reduce_mean(tf.square(y_ - y))`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.2 自定义损失函数\n",
    "\n",
    "这里以预测商品销量问题为例。如果预测多了，商家损失的是商品的成本（设为1）；如果预测少了，损失的是商品的利润（设为10）。如果神经网络最小化的是均方差，那么很有可能此模型无法最大化预期的利润。为了最大化利润，应该在这两种情况下设置不同的损失系数：\n",
    "\n",
    "`loss = tf.reduce_mean(tf.where(tf.greater(v1, v2), (v1 - v2) * a, (v2 - v1) * b))`\n",
    "\n",
    "上式中通过tf.greater和tf.where实现了选择功能：\n",
    "- tf.greater会比较输入的两个张量中每个元素的值（支持broadcastin操作），单个元素比较结果为True/False；\n",
    "- tf.where有三个参数，第一个为True时会选择第二个参数的值，否则选择第三个。\n",
    "\n",
    "这两个函数用法参见如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[False False  True  True]\n",
      "[4. 3. 3. 4.]\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "v1 = tf.constant([1.0, 2.0, 3.0, 4.0])\n",
    "v2 = tf.constant([4.0, 3.0, 2.0, 1.0])\n",
    "\n",
    "with tf.Session() as sess:\n",
    "    print(sess.run(tf.greater(v1, v2)))\n",
    "    print(sess.run(tf.where(tf.greater(v1, v2), v1, v2)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面通过一个简单的神经网络程序来介绍损失函数对模型的影响。这个程序中，有两个输入节点，一个输出节点，没有隐藏层，主体程序和3.5.4节中基本一致："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "After 0 training step(s), w1 is: \n",
      "[[-0.81031823]\n",
      " [ 1.4855988 ]] \n",
      "\n",
      "After 1000 training step(s), w1 is: \n",
      "[[0.01247114]\n",
      " [2.138545  ]] \n",
      "\n",
      "After 2000 training step(s), w1 is: \n",
      "[[0.45567423]\n",
      " [2.1706069 ]] \n",
      "\n",
      "After 3000 training step(s), w1 is: \n",
      "[[0.69968736]\n",
      " [1.846531  ]] \n",
      "\n",
      "After 4000 training step(s), w1 is: \n",
      "[[0.8988668]\n",
      " [1.2973604]] \n",
      "\n",
      "Final w1 is: \n",
      " [[1.0193471]\n",
      " [1.0428091]]\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "from numpy.random import RandomState\n",
    "\n",
    "# 生成模拟数据集，这里设置噪音为-0.05~0.05的随机数\n",
    "rdm = RandomState(1)\n",
    "X = rdm.rand(128, 2)\n",
    "Y = [[x1 + x2 + (rdm.rand()/10.0-0.05)] for (x1, x2) in X]\n",
    "\n",
    "# 1. 定义神经网络的参数、输入输出节点。\n",
    "BATCH_SIZE = 8\n",
    "STEPS = 5000\n",
    "\n",
    "w1= tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))\n",
    "\n",
    "x = tf.placeholder(tf.float32, shape=(None, 2), name=\"x-input\")\n",
    "y_ = tf.placeholder(tf.float32, shape=(None, 1), name='y-input')\n",
    "\n",
    "# 2. 定义前向传播、损失函数和反向传播。\n",
    "y = tf.matmul(x, w1)\n",
    "\n",
    "loss_less = 10\n",
    "loss_more = 1\n",
    "loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_) * loss_more, (y_ - y) * loss_less))\n",
    "\n",
    "train_step = tf.train.AdamOptimizer(0.001).minimize(loss)\n",
    "\n",
    "# 3. 训练模型\n",
    "with tf.Session() as sess:\n",
    "    init_op = tf.global_variables_initializer()\n",
    "    sess.run(init_op)\n",
    "    for i in range(STEPS):\n",
    "        start = (i * BATCH_SIZE) % 128\n",
    "        end = (i * BATCH_SIZE) % 128 + BATCH_SIZE\n",
    "        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})\n",
    "        if i % 1000 == 0:\n",
    "            print(\"After %d training step(s), w1 is: \" % (i))\n",
    "            print(sess.run(w1), \"\\n\")\n",
    "    print(\"Final w1 is: \\n\", sess.run(w1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**上个cell中定义损失函数时使得预测少了的损失大，于是模型应该偏向多的方向预测。**\n",
    "\n",
    "**下面重新定义损失函数，使得预测多了的损失大，于是模型应该偏向少的方向预测。**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "After 0 training step(s), w1 is: \n",
      "[[-0.8123182]\n",
      " [ 1.4835987]] \n",
      "\n",
      "After 1000 training step(s), w1 is: \n",
      "[[0.18643522]\n",
      " [1.0739335 ]] \n",
      "\n",
      "After 2000 training step(s), w1 is: \n",
      "[[0.95444274]\n",
      " [0.9808863 ]] \n",
      "\n",
      "After 3000 training step(s), w1 is: \n",
      "[[0.9557403]\n",
      " [0.9806634]] \n",
      "\n",
      "After 4000 training step(s), w1 is: \n",
      "[[0.95466024]\n",
      " [0.9813524 ]] \n",
      "\n",
      "Final w1 is: \n",
      " [[0.9556111]\n",
      " [0.9810191]]\n"
     ]
    }
   ],
   "source": [
    "loss_less = 1\n",
    "loss_more = 10\n",
    "loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_) * loss_more, (y_ - y) * loss_less))\n",
    "train_step = tf.train.AdamOptimizer(0.001).minimize(loss)\n",
    "\n",
    "with tf.Session() as sess:\n",
    "    init_op = tf.global_variables_initializer()\n",
    "    sess.run(init_op)\n",
    "    for i in range(STEPS):\n",
    "        start = (i * BATCH_SIZE) % 128\n",
    "        end = (i * BATCH_SIZE) % 128 + BATCH_SIZE\n",
    "        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})\n",
    "        if i % 1000 == 0:\n",
    "            print(\"After %d training step(s), w1 is: \" % (i))\n",
    "            print(sess.run(w1), \"\\n\")\n",
    "    print(\"Final w1 is: \\n\", sess.run(w1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**最后比较一下定义损失函数为MSE的情形。**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "After 0 training step(s), w1 is: \n",
      "[[-0.81031823]\n",
      " [ 1.4855988 ]] \n",
      "\n",
      "After 1000 training step(s), w1 is: \n",
      "[[-0.1333761]\n",
      " [ 1.8130922]] \n",
      "\n",
      "After 2000 training step(s), w1 is: \n",
      "[[0.32190308]\n",
      " [1.5246348 ]] \n",
      "\n",
      "After 3000 training step(s), w1 is: \n",
      "[[0.6785022]\n",
      " [1.2529727]] \n",
      "\n",
      "After 4000 training step(s), w1 is: \n",
      "[[0.8947401]\n",
      " [1.0859822]] \n",
      "\n",
      "Final w1 is: \n",
      " [[0.9743756]\n",
      " [1.0243336]]\n"
     ]
    }
   ],
   "source": [
    "loss = tf.losses.mean_squared_error(y, y_)\n",
    "train_step = tf.train.AdamOptimizer(0.001).minimize(loss)\n",
    "\n",
    "with tf.Session() as sess:\n",
    "    init_op = tf.global_variables_initializer()\n",
    "    sess.run(init_op)\n",
    "    for i in range(STEPS):\n",
    "        start = (i * BATCH_SIZE) % 128\n",
    "        end = (i * BATCH_SIZE) % 128 + BATCH_SIZE\n",
    "        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})\n",
    "        if i % 1000 == 0:\n",
    "            print(\"After %d training step(s), w1 is: \" % (i))\n",
    "            print(sess.run(w1), \"\\n\")\n",
    "    print(\"Final w1 is: \\n\", sess.run(w1))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
